feat(runtime): strict issue resolution and silent terminal state fix #16
heidi-dang wants to merge 47 commits into
…commands and bump to v3.12.3
…skill to prevent fabrication.
… featuring advanced commit message generation and agent runtime enhancements.
…ment, and state ledger for enhanced control and auditability.
…ontract` hook for strict output enforcement, and update agent completion instructions.
…ent prompts into dedicated files.
…ion, and deterministic execution, replacing the dynamic prompt builder and adding reliability diagnostics.
…add `README_RELIABILITY_RUNTIME.md` for detailed reliability runtime documentation.
…type mismatches in tools
* fix(phase1): revert runtime file changes, keep prompt-only capability ports
- Revert src/hooks/runtime-enforcement/hook.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/state-ledger.ts to HEAD (runtime authority unchanged)
- Revert src/agents/runtime/tool-runner.ts to HEAD (runtime authority unchanged)
- Revert src/agents/oracle.ts to HEAD (Heidi baseline is already stronger)
- Revert src/agents/sisyphus.ts (flat file) to HEAD - Heidi's modular flow preserved
- Revert src/agents/builtin-agents.ts to HEAD - sisyphus-agent wiring preserved
- Hephaestus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Sisyphus dir: prompt-only, imports hard-blocks/anti-patterns from Heidi's prompts module
- Add isGpt5_4Model/isGpt5_3CodexModel type guards to types.ts for Hephaestus dispatch
- Update doctor check: remove dynamic-agent-prompt-builder as forbidden (passive library)
- Update doctor check: loop guard deferred to separate runtime-only PR
Phase 0 changes (Atlas/Gemini/GPT verification wave) and Sisyphus/Hephaestus
prompt-layer capability improvements are preserved. No runtime changes in this PR.
* fix(phase1): completely eliminate builder from capabilities & unify ledger
- Removed dynamic-agent-prompt-builder imports from Sisyphus and Hephaestus
- Ported official wording/orchestration into src/agents/prompts/orchestration.ts
- Moved Agent/Skill/Tool/Category types into src/agents/types.ts
- Deleted dynamic-agent-prompt-builder and restored it to the doctor forbidden list
- Tightened runtime enforcement hook to scan current chat flow instead of global historical ledger
- Dropped generic phrase matching ('success') from enforcement checks
- Unused/shadow state-ledger from agents runtime deleted
- Unified state-ledger across agent and core runtime
* fix(phase1): harden truth model & completion paths
- Enhanced State Ledger schema (success, verified, changedState, stdout, sessionID)
- complete_task & query_ledger mapped to strictly verified, session-scoped truth
- execution-journal hook forwards full status booleans and stdout to ledger
- tool-contract hook strictly types metadata booleans and asserts ledger presence for state changes
- Purged prototype scaffolding (agent-runner, tool-runner, context-builder)
- Updated doctor checks and docs for Phase 1 closure
* fix(phase2): remove Hephaestus GPT-only exclusivity
- Removed 'Hephaestus is designed exclusively for GPT' block in no-hephaestus-non-gpt hook
- Removed requiresProvider limitations from shared and CLI model requirements for Hephaestus
- Added model resolution regression tests specifying Grok resolves properly for config
- Updated upstream capability doctor to strictly forbid the return of these GPT-only rule strings
* fix(phase3): wire truth model to live registry and enforce contract
- Wired DETERMINISTIC_TOOLS into active createToolRegistry
- Exposed execution, plan, and runtime enforcement hooks via createToolGuardHooks
- Enforced strict boolean contract (success, verified, changedState) on all deterministic tools
- Added unit tests for isolated query_ledger and complete_task behavior
- Updated upstream capability tracker
* fix: Add xai-usage-patch hook to correct negative input token display for xAI/Grok models by adding cache read tokens back to input.
* Refactor `execution-journal` hook to access tool arguments from output metadata, create `execution.jsonl` for journaling, and add truth model integration tests.
* feat: Strengthen tool contract enforcement by requiring verified ledger entries and adding runtime validation for message claims.
* Refactor the truth model to prevent unverified bash commands from creating ledger entries and implement flow isolation in runtime enforcement to validate state claims against the current execution flow.
* fix(model-requirements): remove hardcoded gpt-5.3-codex constraints and allow non-gpt models for hephaestus
* fix(model-requirements): prioritize Claude in deep category and cleanup GPT-first bias
* fix: harden tool contract, fix LSP environment, and improve Git detection
- Standardize safety tool results with new shared helper.
- Hardened tool-contract hook for robust metadata parsing.
- Global installation and verification of TypeScript LSP dependencies.
- Added Git worktree detection and upfront startup validation.
- Enhanced doctor checks for system reliability (10/10 Verified).
* fix: standardize all tool metadata and add robust VCS regression tests
- Flattened metadata for plan and query-ledger tools.
- Added programmatic tool contract shape checker.
- Integrated shape checks into doctor.py.
- Added unit and regression tests for VCS detection.
- Verified system stability with 10/10 doctor pass.
* fix: improve Loop Guard recovery flow and UI signal
- Added injectForcedReplan to PlanCompiler to force strategy shifts.
- Updated Semantic Loop Guard to trigger replans and show green toasts.
- Rendered loop guard messages in green in the CLI.
- Added regression tests for recovery flow.
- Updated reliability doctor to verify recovery wiring.
* create conflict on main
* feat: enforce workflow scripts and prePR doctors
* fix: graceful degradation for unsupported LSP extensions
- Replace heuristic tool blocking with explicit whitelist of 32 essential tools
- Add PlanCompilerGuardError custom exception class with detailed error info
- Add comprehensive tests covering allowed/blocked tools and fallback scenarios
- Add dedicated doctor check for plan compiler guard verification
- Fix plan-compiler null safety to prevent crashes on corrupted state
The guard now allows read, edit, task, diagnostic, and shell tools during any active step while still blocking genuinely inappropriate tools. This resolves the total deadlock that prevented step execution.
Resolves: Plan Compiler Guard blocking all tools including basic read, task, and edit tools
* refactor: hard audit, fix model fallback and test isolation
* feat: subagent progress bars and plan compiler guard unlock
* feat: implement aggregate status tracking and fix model selection persistence
* fix: tool contract regression for safety tools and central normalization
- Added edit-safeguard hook for atomicity and syntax validation
- Added EDIT_ATOMICITY doctor check
- Fixed corrupted scrape_repos.py
- Registered hook and check in system
- Implemented ModelResolutionTracker for centralized tracking
- Added sessionModel to resolution priority (UI > Session > Agent > Default)
- Eliminated console.log spam in model-resolution-pipeline
- Updated all agent builders to support explicit sessionModel inheritance
- Added exhaustive tests for priority and tracker logic
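The resolution priority named in the last commit (UI > Session > Agent > Default) can be pictured with a small sketch. This is only an illustration: `resolveModel`, `ModelCandidates`, and `ResolutionTracker` are hypothetical names, not the PR's actual `ModelResolutionTracker` API, and the model strings are placeholders.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's code.
type ModelSource = "ui" | "session" | "agent" | "default";

interface ModelCandidates {
  uiModel?: string;      // model picked explicitly in the UI
  sessionModel?: string; // model inherited from the session
  agentModel?: string;   // model configured for the agent
}

interface ResolvedModel {
  model: string;
  source: ModelSource;
}

// Walk the candidates in priority order: UI > Session > Agent > Default.
function resolveModel(c: ModelCandidates, defaultModel: string): ResolvedModel {
  const ordered: Array<[ModelSource, string | undefined]> = [
    ["ui", c.uiModel],
    ["session", c.sessionModel],
    ["agent", c.agentModel],
  ];
  for (const [source, model] of ordered) {
    // Empty or whitespace-only values are treated as "not set".
    if (model !== undefined && model.trim() !== "") {
      return { model, source };
    }
  }
  return { model: defaultModel, source: "default" };
}

// Minimal stand-in for a centralized tracker: remembers the last
// resolution per session so callers do not need to log each decision.
class ResolutionTracker {
  private readonly bySession = new Map<string, ResolvedModel>();

  record(sessionID: string, resolved: ResolvedModel): void {
    this.bySession.set(sessionID, resolved);
  }

  get(sessionID: string): ResolvedModel | undefined {
    return this.bySession.get(sessionID);
  }
}
```

Returning the winning source alongside the model is a design choice of this sketch; it lets a tracker explain why a model was selected without extra logging.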
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the reliability and determinism of the agent runtime by introducing a suite of new enforcement mechanisms and monitoring tools. It addresses issues of silent terminal states and incomplete task assertions by implementing strict verification contracts, a watchdog for stalled runs, and a robust framework for managing agent execution flow. The changes also include substantial updates to core agent prompts to align with these stricter protocols and new developer workflow tools to ensure code quality and prevent conflicts.
Code Review
This pull request introduces new agent configurations and prompt definitions for 'Oracle', 'Sisyphus', and 'Hephaestus' agents, along with related test files and utility functions. The 'Oracle' agent is configured as a read-only technical advisor with model-specific prompts. The 'Sisyphus' agent is defined as a powerful orchestrator with extensive behavior instructions, including intent gates, delegation, and recovery mechanisms. The 'Hephaestus' agent is implemented as an autonomous deep worker with model-optimized prompts for GPT-5.4, GPT-5.3 Codex, and generic GPT models, and includes dynamic task/todo discipline sections. Additionally, a new 'Chat' agent is introduced as a conversational assistant with no tool access.
The ralph-loop hook, which manages iterative development loops, has been updated to include a 'reset' strategy for creating new sessions and to correctly handle 'ultrawork' verification phases, ensuring that completion promises are detected from session messages or transcripts. Several doctor checks were added for edit atomicity, plan compiler integrity, progress tracking, tool metadata contracts, and run state watchdog. The package.json and bun.lock files were updated to reflect version changes and new dependencies, including playwright.
Review comments highlight unresolved git merge conflicts in README.md and .runtime/journal/execution.jsonl, and a typo in the 'Hephaestus' agent description where 'GPT 5.2 Codex' should be 'GPT 5.3 Codex'.
Note: Security Review did not run due to the size of the PR.
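Unresolved conflict markers of the kind flagged in this review could be caught before commit with a small scan. The helper below is a hypothetical sketch, not something this PR contains; `findConflictMarkers` and `ConflictMarker` are illustrative names, and it only recognizes the three standard Git marker lines.

```typescript
// Hypothetical helper, not part of the PR: reports Git conflict marker lines.
interface ConflictMarker {
  line: number; // 1-based line number in the scanned text
  kind: "start" | "separator" | "end";
}

function findConflictMarkers(text: string): ConflictMarker[] {
  const markers: ConflictMarker[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    if (raw.startsWith("<<<<<<< ")) {
      markers.push({ line: index + 1, kind: "start" });
    } else if (raw === "=======") {
      markers.push({ line: index + 1, kind: "separator" });
    } else if (raw.startsWith(">>>>>>> ")) {
      markers.push({ line: index + 1, kind: "end" });
    }
  });
  return markers;
}
```

A non-empty result for a JSONL journal or a README would be enough for a pre-PR check to fail, since neither file type has a legitimate use for those lines.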
| <<<<<<< Updated upstream | ||
| {"timestamp":"2026-03-07T02:19:12.785Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:19:12.786Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:19:12.787Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:20:51.663Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:20:51.664Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:20:51.664Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:22:43.346Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:22:43.346Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:22:43.347Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| ======= | ||
| {"timestamp":"2026-03-07T02:37:05.798Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:37:05.799Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:37:05.799Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:37:34.824Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:37:34.824Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:37:34.824Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:38:26.876Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:38:26.876Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:38:26.876Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:38:58.205Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:38:58.205Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:38:58.206Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:39:59.445Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:39:59.447Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:39:59.448Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:40:58.573Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:40:58.573Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:40:58.574Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:43:19.727Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:43:19.728Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:43:19.728Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:44:59.251Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:44:59.252Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:44:59.252Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:46:18.780Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:46:18.781Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:46:18.781Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:46:34.671Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:46:34.671Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:46:34.671Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:53:44.574Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:53:44.574Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:53:44.575Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:55:03.040Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:55:03.041Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:55:03.041Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:59:04.640Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T02:59:04.640Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T02:59:04.641Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:06:11.159Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:06:11.223Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:06:11.224Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:11:12.154Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:11:12.155Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:11:12.155Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:14:01.730Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:14:01.731Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:14:01.731Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:22:02.008Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:22:02.008Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:22:02.009Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:26:08.607Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:26:08.607Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:26:08.608Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:29:59.580Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:29:59.580Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:29:59.581Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:32:39.933Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"bash","args":{"command":"git commit -m 'test'"},"stdout":"Commit created on main"} | ||
| {"timestamp":"2026-03-07T03:32:39.934Z","sessionID":"ses_1","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| {"timestamp":"2026-03-07T03:32:39.935Z","sessionID":"ses_flow_test","agent":"tracked-agent","intent":"execute_tool","tool":"git_safe","stdout":"Success","verificationState":true} | ||
| >>>>>>> Stashed changes |
| <br /> | ||
| Google • Microsoft • Amazon • ELESTYLE • Indent | ||
| </div> | ||
| conflict |
| return { | ||
| description: | ||
| "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)", |
There's a minor typo in the agent description. It refers to GPT 5.2 Codex, but this file is for GPT 5.3 Codex.
| "Autonomous Deep Worker - goal-oriented execution with GPT 5.2 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)", | |
| "Autonomous Deep Worker - goal-oriented execution with GPT 5.3 Codex. Explores thoroughly before acting, uses explore/librarian agents for comprehensive context, completes tasks end-to-end. Inspired by AmpCode deep mode. (Hephaestus - OhMyOpenCode)", |
Description
Fixes the bug where tasks were ending with silent/stalled terminal states without outputting final verification text. Also enforces strict completeness logic for tools asserting 'task completed'.
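The strict completeness logic can be pictured as a check against session-scoped ledger entries. The sketch below is an assumption-laden illustration: the field names follow the ledger schema listed in the commit notes (success, verified, changedState, stdout, sessionID), while `canCompleteTask`, `CompletionDecision`, and the rule that at least one state change must be recorded are hypothetical choices made for this example.

```typescript
// Hypothetical shapes: field names mirror the ledger schema from the commit
// notes; the decision rule itself is an assumption for illustration.
interface LedgerEntry {
  sessionID: string;
  tool: string;
  success: boolean;
  verified: boolean;
  changedState: boolean;
  stdout: string;
}

interface CompletionDecision {
  allowed: boolean;
  reason: string;
}

function canCompleteTask(
  ledger: readonly LedgerEntry[],
  sessionID: string,
): CompletionDecision {
  // Only entries from the current session count; older sessions are ignored.
  const scoped = ledger.filter((e) => e.sessionID === sessionID);
  const stateChanges = scoped.filter((e) => e.changedState);

  // Assumed rule: a completion claim needs at least one recorded state change.
  if (stateChanges.length === 0) {
    return { allowed: false, reason: "no state-changing entries recorded for this session" };
  }

  // Every state change must have succeeded and been verified.
  const unverified = stateChanges.filter((e) => !(e.success && e.verified));
  if (unverified.length > 0) {
    return { allowed: false, reason: `${unverified.length} state change(s) not verified` };
  }
  return { allowed: true, reason: "all state changes verified" };
}
```

Scoping by `sessionID` before checking anything else keeps a verified entry from an earlier session from vouching for a claim made in the current one.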
Changes
- src/utils/tool-contract-wrapper.ts)
- `RunStateWatchdogManager` to intercept silent post-tool dead zones and emit 'Still working...' UI toasts.
- `messages.transform` to natively inject a synthetic 'Task Completed / Stopped' transcript payload if the model terminates without free-text.
Testing
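A deterministic sketch of the two behaviours listed under Changes, written so it can be exercised without real timers. Everything here is hypothetical: `SilentRunWatchdog` and `finalTextOrFallback` are illustrative names, the real `RunStateWatchdogManager` and `messages.transform` APIs are not shown in this description, and the fallback strings are assumed.

```typescript
// Hypothetical sketch; not the PR's RunStateWatchdogManager.
type Notify = (message: string) => void;

class SilentRunWatchdog {
  private readonly silenceThresholdMs: number;
  private readonly notify: Notify;
  private lastActivityMs: number;
  private notified = false;

  constructor(silenceThresholdMs: number, notify: Notify, startMs: number) {
    this.silenceThresholdMs = silenceThresholdMs;
    this.notify = notify;
    this.lastActivityMs = startMs;
  }

  // Call whenever the run produces output (tool result, text chunk, ...).
  recordActivity(nowMs: number): void {
    this.lastActivityMs = nowMs;
    this.notified = false;
  }

  // Call periodically; emits at most one notice per silent stretch.
  check(nowMs: number): boolean {
    const silentFor = nowMs - this.lastActivityMs;
    if (silentFor >= this.silenceThresholdMs && !this.notified) {
      this.notified = true;
      this.notify("Still working...");
      return true;
    }
    return false;
  }
}

// Fallback for a run that ended without free text (strings are assumed).
function finalTextOrFallback(finalText: string | undefined, stopped: boolean): string {
  if (finalText !== undefined && finalText.trim() !== "") {
    return finalText;
  }
  return stopped ? "Task Stopped" : "Task Completed";
}
```

Passing timestamps in explicitly, instead of reading a clock inside the class, is a choice made here so the silent-stretch logic can be checked step by step.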